Self-organising Data Mining

Many different data mining tools are available, and many papers have been published describing data mining techniques. In our view, the most important requirement for a more sophisticated data mining technique is that it limit the user's involvement in the entire data mining process to the inclusion of well-known a priori knowledge. This makes the process more automated and more objective. Most users are primarily interested in generating useful and valid models without needing extensive knowledge of mathematical, cybernetic and statistical techniques, and without the time demanded by complex dialog-driven modelling tools. Soft computing, i.e., fuzzy modelling, Neural Networks, Genetic Algorithms and other methods of automatic model generation, is a way to mine data by generating mathematical models from empirical data more or less automatically. In recent years there has been much publicity about the ability of Artificial Neural Networks to learn and to generalise, despite persistent problems in the design, development and application of Neural Networks.
Self-organising data mining creates models of optimal complexity systematically and autonomously by employing both parameter and structure identification. A model of optimal complexity is one that optimally balances model quality on a given learning data set ("closeness of fit") against its generalisation power on new, previously unseen data, with respect to the data's noise level and the modelling task (prediction, classification, etc.). It thus solves the basic problem of experimental systems analysis: systematically avoiding overfitted models on the basis of the data's information alone. This makes self-organising data mining a highly automated, fast and efficient supplement and alternative to other data mining methods. The differences between Neural Networks and this new approach centre on Statistical Learning Networks and induction. The first Statistical Learning Network algorithm of this type, the Group Method of Data Handling (GMDH), was developed by A.G. Ivakhnenko in 1967. Considerable improvements were introduced in the 1970s and 1980s by versions of the Polynomial Network Training algorithm (PNETTR) by Barron and the Algorithm for Synthesis of Polynomial Networks (ASPN) by Elder, as Adaptive Learning Networks and GMDH flowed together. Further enhancements of the GMDH algorithm have been realised in the KnowledgeMiner software described in, and enclosed with, this book.
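To illustrate this selection principle, the following minimal Python sketch (illustrative names only, not code from KnowledgeMiner) fits polynomial candidate models of increasing degree on a learning subsample and selects the degree with the smallest error on a separate checking subsample, a simple external selection criterion:

import numpy as np

def select_optimal_complexity(x, y, max_degree=8):
    # Split the sample: every third point forms the checking set, the rest
    # the learning set (parameter estimation uses the learning set only).
    mask = np.arange(len(x)) % 3 == 0
    x_check, y_check = x[mask], y[mask]
    x_learn, y_learn = x[~mask], y[~mask]
    best = None
    for degree in range(1, max_degree + 1):            # structure candidates
        coeffs = np.polyfit(x_learn, y_learn, degree)  # parameter identification
        mse = np.mean((np.polyval(coeffs, x_check) - y_check) ** 2)
        if best is None or mse < best[1]:              # external criterion decides
            best = (degree, mse, coeffs)
    return best

# Noisy quadratic test data: the checking error, not the learning error,
# keeps the selected polynomial degree close to the true one.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.3, x.size)
degree, mse, _ = select_optimal_complexity(x, y)
print(f"selected degree: {degree}, checking MSE: {mse:.4f}")

The learning error alone always decreases with growing complexity; only the error on data not used for fitting can indicate where additional complexity merely models noise.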
This book provides a thorough introduction to self-organising data mining technologies for business executives, decision makers and specialists involved in developing Executive Information Systems (EIS) or in modelling, data mining or knowledge discovery projects. It is a book for working professionals in many fields of decision making: economics (banking, finance, marketing), business-oriented computer science, ecology, medicine and biology, sociology, the engineering sciences and all other fields concerned with modelling ill-defined systems. Each chapter includes practical examples and a reference list for further reading. The accompanying diskette/internet download contains the KnowledgeMiner demo version and several executable examples. This book offers a comprehensive view of all major issues related to self-organising data mining and its practical application to solving real-world problems. It not only gives an introduction to self-organising data mining, but also answers the questions that arise when applying it in practice.
Why Data Mining is Needed

Models make it possible to describe, analyse and predict the behaviour of systems, and thus to support decisions.
Therefore, mathematical modelling forms the core of almost all decision support systems. Models can be derived from existing theory (the theory-driven approach, or theoretical systems analysis) and/or from data (the data-driven approach, or experimental systems analysis).

a. Theory-driven approach

For complex ill-defined systems, such as economic, ecological, social and biological systems, we have insufficient a priori knowledge about the relevant theory of the system under study. Theory-driven modelling is considerably hampered by the fact that the modeller often has to know things about the system that are generally impossible to find out. This concerns uncertain a priori information regarding the selection of the model structure (factors of influence and functional relations) as well as insufficient knowledge about interference factors (actual interference factors, and factors of influence that cannot be measured). Accordingly, the insufficient a priori information concerns above all the selection of the model structure and the treatment of unknown or unmeasurable interference factors.
In order to overcome these problems and to deal with ill-defined systems, and in particular with insufficient a priori knowledge, ways must be found, with the help of emergent information engineering, to shorten the time- and resource-intensive model formation process that precedes initial task solving. Computer-aided design of mathematical models may soon prove highly valuable in bridging this gap.

b. Data-driven approach

Modern information technologies deliver a flood of data, and the question is how to leverage it. Commonly, statistically based principles are used for model formation, but they always require a priori knowledge about the structure of the mathematical model. In addition to the epistemological problems of commonly used statistical principles of model formation, several methodological problems arise from the insufficiency of a priori information: the indeterminacy of the starting position, marked by the subjectivity and incompleteness of the theoretical knowledge and by an insufficient data basis. Knowledge discovery from data, and specifically data mining techniques and tools, can assist humans in analysing the mountains of data and in turning the information they contain into successful decisions. Data mining comprises not a single analytical technique but many methods and techniques, depending on the nature of the enquiry: data visualisation, tree-based methods and methods of mathematical statistics, as well as knowledge extraction from data by self-organising modelling. Data mining is an interactive and iterative process of numerous subtasks and decisions, such as data selection and pre-processing, choice and application of data mining algorithms, and analysis of the extracted knowledge. Most important for a more sophisticated data mining application is to limit the involvement of users in the overall process to the inclusion of existing a priori knowledge, while making the process more automated and more objective. Automatic model generation such as GMDH, Analog Complexing and Fuzzy Rule Induction meets these demands and sometimes provides the only way to generate models of ill-defined problems.
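To make the idea of automatic model generation concrete, here is a minimal sketch of one GMDH-style layer, assuming a quadratic transfer polynomial in two inputs and selection by error on a separate checking set. It is a conceptual illustration under these assumptions; the function and variable names are hypothetical and not taken from KnowledgeMiner or any specific GMDH variant described in this book:

from itertools import combinations
import numpy as np

def gmdh_layer(X_learn, y_learn, X_check, y_check, keep=4):
    """Fit a quadratic polynomial for every pair of inputs on the learning
    set, rank candidates by their error on the checking set (the external
    criterion), and return the 'keep' best candidate outputs."""
    candidates = []
    for i, j in combinations(range(X_learn.shape[1]), 2):
        def design(X, i=i, j=j):
            xi, xj = X[:, i], X[:, j]
            return np.column_stack([np.ones_like(xi), xi, xj,
                                    xi * xj, xi**2, xj**2])
        coeffs, *_ = np.linalg.lstsq(design(X_learn), y_learn, rcond=None)
        err = np.mean((design(X_check) @ coeffs - y_check) ** 2)
        candidates.append((err, design(X_learn) @ coeffs, design(X_check) @ coeffs))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:keep]
    # Outputs of the surviving candidates become inputs of the next layer.
    Z_learn = np.column_stack([c[1] for c in best])
    Z_check = np.column_stack([c[2] for c in best])
    return best[0][0], Z_learn, Z_check

# Usage on synthetic data with a product term and one linear term:
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 5))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0.0, 0.1, 90)
err, Z_learn, Z_check = gmdh_layer(X[:60], y[:60], X[60:], y[60:])
print(f"best checking MSE after one layer: {err:.4f}")

Stacking such layers, feeding the surviving outputs forward as new inputs and stopping once the best checking error no longer improves, yields a network whose structure is determined by the data rather than by the user.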